Fully Connected Feed-Forward Network

In this notebook we will experiment with a feed-forward Fully Connected Neural Network (FC-NN) for a classification task: image classification on the MNIST dataset.

RECALL

In the FC-NN, the output of each layer is computed using the activations from the previous one, as follows:

$$h_{i} = \sigma(W_i h_{i-1} + b_i)$$

where ${h}_i$ is the activation vector from the $i$-th layer (or the input data for $i=0$), ${W}_i$ and ${b}_i$ are the weight matrix and the bias vector for the $i$-th layer, respectively.
$\sigma(\cdot)$ is the activation function. In our example, we will use the ReLU activation function for the hidden layers and softmax for the last layer.
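For reference, the two activation functions are defined as:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{softmax}(\mathbf{z})_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$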

To regularize the model, we will also insert a Dropout layer between consecutive hidden layers.

Dropout works by “dropping out” some unit activations in a given layer, that is, setting them to zero with a given probability $p$.
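Formally, with drop probability $p$, dropout multiplies the activations by a random binary mask:

$$\tilde{h}_i = m \odot h_i, \qquad m_j \sim \mathrm{Bernoulli}(1-p)$$

where $\odot$ denotes the element-wise product; at test time no units are dropped.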

Our loss function will be the categorical crossentropy.
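For a one-hot target $y$ and predicted class distribution $\hat{y}$ (the softmax output), this loss reads

$$\mathcal{L}(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k$$

i.e. the negative log-probability assigned to the correct class.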

Model definition

Keras supports two different kinds of models: the Sequential model and the Graph model. The former is used to build linear stacks of layers (so each layer has one input and one output), while the latter supports arbitrary connection graphs.

In our case we build a Sequential model with three Dense (aka fully connected) layers, with some Dropout. Notice that the output layer has the softmax activation function.

The resulting model is effectively a function from its inputs to its outputs, implemented using the Keras backend.

We apply the categorical crossentropy loss and choose SGD as the optimizer.

Keep in mind that Keras supports a variety of different optimizers and loss functions, which you may want to check out.
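For example, compiling the same model with RMSprop and a custom learning rate (shown only as an illustration; the cells below use plain SGD) would look like:

from keras.optimizers import RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['accuracy'])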


In [1]:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = ""
#os.environ['THEANO_FLAGS'] = "device=gpu2"
from keras.models import load_model
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.optimizers import SGD

nb_classes = 10

# FC@512+relu -> DropOut(0.2) -> FC@512+relu -> DropOut(0.2) -> FC@nb_classes+softmax
# ... your code here


Using Theano backend.

In [2]:
# %load solutions/sol_221_1.py
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer=SGD(), 
              metrics=['accuracy'])

Data preparation (keras.datasets)

We will train our model on the MNIST dataset, which consists of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.

Since this dataset is bundled with Keras, we simply ask the keras.datasets module for the training and test data.

We will:

  • download the data
  • reshape it into vector form (the original samples are 28x28 images)
  • normalize the values between 0 and 1.

The categorical_crossentropy loss expects one-hot vectors as targets, therefore we apply the to_categorical function from keras.utils to convert the integer labels to one-hot vectors.


In [3]:
from keras.datasets import mnist
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")
X_train /= 255
X_test /= 255

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
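As a quick sanity check, the reshaped data and one-hot labels should have the following shapes:

print(X_train.shape, Y_train.shape)   # (60000, 784) (60000, 10)
print(X_test.shape, Y_test.shape)     # (10000, 784) (10000, 10)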

Training

Having defined and compiled the model, we can train it using the fit function. We also pass a validation dataset to monitor validation loss and accuracy during training.


In [4]:
# You can train the network yourself or simply load a saved model :P, for now!!
#network_history = model.fit(X_train, Y_train, batch_size=1000, 
#                            nb_epoch=100, verbose=1, validation_data=(X_test, Y_test))
#model.save('example_MNIST_FC.h5')

In [5]:
model = load_model('example_MNIST_FC.h5')
model.summary()


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_1 (Dense)                  (None, 512)           401920      dense_input_2[0][0]              
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 512)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 512)           262656      dropout_1[0][0]                  
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 512)           0           dense_2[0][0]                    
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 10)            5130        dropout_2[0][0]                  
====================================================================================================
Total params: 669706
____________________________________________________________________________________________________
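The parameter counts follow from the weights plus biases of each Dense layer:

$$784 \times 512 + 512 = 401920, \quad 512 \times 512 + 512 = 262656, \quad 512 \times 10 + 10 = 5130$$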

Plotting Network Performance Trend

The return value of the fit function is a keras.callbacks.History object, whose history attribute records the training/validation loss and accuracy for each epoch. We can therefore plot the behaviour of loss and accuracy during the training phase. Note that since we loaded the model from disk instead of training it, network_history is not defined here and the next cell raises a NameError; to produce the plots, re-run the fit cell above first.


In [6]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.plot(network_history.history['loss'])
plt.plot(network_history.history['val_loss'])
plt.legend(['Training', 'Validation'])

plt.figure()
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.plot(network_history.history['acc'])
plt.plot(network_history.history['val_acc'])
plt.legend(['Training', 'Validation'], loc='lower right')


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-d377be33ece5> in <module>()
      5 plt.xlabel('Epochs')
      6 plt.ylabel('Loss')
----> 7 plt.plot(network_history.history['loss'])
      8 plt.plot(network_history.history['val_loss'])
      9 plt.legend(['Training', 'Validation'])

NameError: name 'network_history' is not defined

In [7]:
import numpy as np
print(np.argmax(model.predict(X_test[5:10]),1))
print(y_test[5:10])


[1 4 9 6 9]
[1 4 9 5 9]

Notice that the fourth sample shown above (true label 5) was misclassified as 6.

In [ ]:
# Can you write a snippet that finds a misclassified sample in X_train and
# displays the image together with its correct label and the model's prediction?
# (A possible sketch follows.)
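A minimal solution sketch (assuming the model and data from the cells above; picking the first mismatch is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt

# predicted classes for the whole training set
preds = np.argmax(model.predict(X_train), axis=1)

# indices where the prediction disagrees with the true label
wrong = np.where(preds != y_train)[0]
idx = wrong[0]  # first misclassified sample

plt.imshow(X_train[idx].reshape(28, 28), cmap='gray')
plt.title('True: %d - Predicted: %d' % (y_train[idx], preds[idx]))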